Click on our Sponsors to help Support SunWorld
Performance Q & A by Adrian Cockcroft

Learn to performance tune your Java programs

Measure Java performance with this new data graphing tool

SunWorld
October  1997
[Next story]
[Table of Contents]
[Search]
Subscribe to SunWorld, it's free!

Abstract
This month, Adrian shows us his new graphing tool for measuring Java performance. The tool can time operations and generate a report to help you pinpoint and analyze performance tie-ups. (2,500 words)


Mail this
article to
a friend

Q How can I find out how to do performance tuning for Java programs?

AI asked around and found a useful Web site with information. I also have a new Java percollator graphing program that I'm tuning that includes a useful class to help measure its performance. I've been trying out the profiler feature of Java WorkShop 2.0 on it as well.

Almost two years ago I figured out Java enough to put together a very simple performance graphing applet. This summer I helped a student intern rewrite and extend it. The result is a much more useful tool that runs as both an applet and as an application, and which browses the files generated by my percollator Web server performance monitor script. When graphing large amounts of data the program can slow down, so code was added to time critical operations and report a summary. Because this is a useful code fragment in itself, that's where I'll start.


Advertisements

The Metrognome class
I wanted a simple object that measured a time interval multiple times, then generated a summary of the results. The name is a joke, as it's a kind of small helpful personalized metronome. The main design idea was to create a separate object that had a simple collection interface but could also be upgraded and extended without changing the interface. That way a program using the Metrognome class could be instrumented in more detail by just replacing the class. No recompilation or interface changes should be necessary.

A Metrognome is constructed by giving it a text label that identifies it.

Metrognome redrawTimer = new Metrognome("Display redraw");

At some point in the code, let's say during the paint method that redraws the display, we call the start method then call the stop method.

paint() {
	redrawTimer.start();
	// lots of display update code
	redrawTimer.stop();
}

The Metrognome is now collecting the duration of all display updates. Somewhere else in the code we need to have a way to show the results. The getSummary method returns a String that can be displayed in a GUI, written on exit to the Java console or whatever else you like. Some individual performance data can also be read from the Metrognome and displayed, but it is easy to update the contents of the String if a more sophisticated Metrognome is used.

The summary string shows the label, count, and min/mean/max times in seconds. For example,

"Display redraw: count= 5  Latency(s) min= 0.017  mean= 0.029  max= 0.041"

More ideas
As usual I have more ideas than time to implement them. Some possibilities for extending the Metrognome class, without changing the interface to the rest of the program, might include:

Java tuning and Java Virtual Machines
One of the nice things about Java is that investment levels are very high. This means that everyone is competing to provide the fastest Java Virtual Machine, and speeds are increasing all the time. This can trap the unwary. Things that are slow in one release can suddenly become fast in the next one. You must be very careful not to write unnatural code just for the sake of short term performance gains. The recent introduction of first-generation Just In Time (JIT) compilers has made loops go much faster, but they do not speed up method invocations very much. As more JIT-based optimizations are perfected more language features will be accelerated. The cost is that the JIT compiles code at runtime, so initial application startup can be slower.

On Solaris, there are several versions of Java, summarized in the table below. Java is upwards compatible, but if you use recent features you cannot go back and run the code on earlier releases. The Java 1.0 specification and its support classes were found to have problems in some areas. Java 1.1 was based on a lot of user feedback and takes code from several vendors, not just Sun, to provide a much more robust and usable standard that should remain stable for much longer than 1.0.

Vendor Browser Version JIT? Java compat
JavaSoft Appletviewer JDK 1.0.2 No 1.0
JavaSoft Appletviewer JDK 1.1.3 Yes 1.0 and 1.1
JavaSoft HotJava 1.0 Yes 1.0 and 1.1
Netscape Navigator 2.0 and 3.0 No 1.0
Netscape Navigator 4.0.2 Yes 1.0 and most 1.1

GPercollator examples
The GUI is shown below with a single set of data (one days worth) loaded showing HTTP rates, running as a program. I set the base of the Y-axis to zero and asked it to integrate the data to get the daily total.


GPercollator GUI

If I append several days of data, clone the sequence twice and pick HTTP operations/second, the peak 5 second rate, and the status code display, it gets a lot busier, and the graph update slows down.

GPercollator performance and size issues
The two operations that we timed using Metrognomes in GPercollator are the graphical display update and the data load from URL operation. At present they seem to run reasonably quickly, but as the amount of data on the display increases the update slows down. One display option is triggered by an "Info" button. This shows credits, the version, and the Metrognome summaries.


More Percollator:
httpop/s x time

The most CPU-intensive operation is loading a new file. Each file contains about 280 rows and 20 to 30 columns of data, mostly numeric. Processing the file is tricky and inefficient, taking up to a second of CPU time. I tried to use the StreamTokenizer, but it doesn't behave the way I want it to when processing numbers. I store the result in a class called a DataFrame that contains a Vector of Vectors. This can cope with the variation in type from one column to another. Some data starts with a numeral, but is really a string, like the local timezone timestamp -- "08:35:20." The StreamTokenizer breaks this into two tokens if you let it interpret the number. We force it to pick off space delimited strings and check that the whole token is a valid number before converting the data using Double(String). This conversion operation is inefficient -- a comment in the code for the Double class points this out. The StreamTokenizer contains its own code for parsing numbers which seems like wasteful duplication, but it is much faster. I'm hoping that the performance of Double(String) will be fixed for me one day.

The other area of concern is the size of the program. It can load many days worth of data and then show up to 16 differently colored plot lines on the graph. The process size on Solaris when run as an application can reach well over 10 MB. The code size is under 50 KB, but it does require the additional Java WorkShop GUI class library visualrt.zip, which is about 600 KB. To save downloading this dynamically each time you start an applet, you may want to grab a copy and put it on your local CLASSPATH. The main memory hog is the array of DataFrames each containing a Vector of Vectors of Doubles or Strings. I'd like to make it a Vector of arrays of doubles where possible but haven't yet figured out how. I did trim the Vectors down to size once I had finished reading the file. I tried to find a tool that could pinpoint the memory usage of each object, but only found people that agreed with me that it would be a good idea to build one.

GPercollator is Java 1.1 based, and it uses the new event model feature, so it doesn't work with older versions of Java. This means that you cannot run it using Netscape Navigator (at least up to version 4.02) or Microsoft Internet Explorer. It does work in HotJava 1.0 (the version provided with Solaris 2.6) and the JDK 1.1.3 appletviewer also provided with Solaris 2.6. It also runs as a program from the command line using JDK 1.1.3. By default Java WorkShop builds in the code wrappers that allow generated programs to be run as applets and as applications without any changes. As full Java 1.1 support rolls out over the next few months it will be easier to use it as an applet.

The Java WorkShop performance analyzer
Graham Hazel wrote the new graphical percollator browser (GPercollator) with a lot of feedback and a few code fixes from me (see the Resources section below for links to run it and to get a tar file of the code). We built GPercollator using Java WorkShop 2.0 as the development tool and GUI builder. One feature of Java WorkShop is that it provides a simple menu option that starts the program or applet along with a performance profiler. After the program exits, the profile is loaded, and you can see which methods took longest to run. You can also see and traverse the call hierarchy. When we first tried this, our top routine was an iso8859 character conversion method. Initially we didn't see it because the profiler only shows your own code. When we looked at the system library code as well we could see the problem. When we tracked it down we realized that we were processing the input data without buffering it first. This is a common mistake, and when we wrapped a buffer around the input stream it went a lot faster, and that routine dropped way down the list. We also compiled the application with debug turned on to start with, and when we changed to invoke the optimizer, the individual class sizes dropped very significantly. As a result, we got a reasonable speedup.

Profiler display examples
I compiled the code with debug and used the old style input stream methods. This is deprecated in Java 1.1, but is based on old code I wrote using Java 1.0. I started it up as an applet from Java WorkShop using the profiler button. The tool automatically loaded a data file, and I reloaded it another four times so that the load time would dominate the tool startup time. The initial profile shown does not include system routines.

Image 1:
Debug profile
without system
routines, times
in ms
Image 2:
Debug profile
with system
routines, times
in ms
Image 3:
Debug profile
times with
Java 1.1
InputStreamReader
Image 4:
Debug profile
times with
Java 1.1
BufferedReader
Image 5:
Optimized
profile times

Image 2
When the system routines are shown, the top one is the idle time routine Object.wait. Next comes the stream tokenizer using about 15 seconds of CPU. The first routine of my own code, DataFrame.fetch, is about 1.5 seconds. Input goes via a BufferedInputStream.

Image 3
The code is now brought up to date by adding an InputStreamReader between the input stream and the StreamTokenizer rather than a BufferedInputStream.

InputStreamReader is = new InputStreamReader(url.openStream());
StreamTokenizer st = new StreamTokenizer(is);

This is part of the improved internationalization of Java 1.1. Spot the deliberate mistake. There is now about 200 seconds of overhead with 104 seconds in ByteToChar8859_1.convert on its own. It needs a buffer!

Image 4
This increases the size of the chunk of data being processed by each method invocation, thus reducing the overall overhead. The new code wraps a BufferedReader around the input.

BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream()));
StreamTokenizer st = new StreamTokenizer(br);

This reduces the overhead to about the same level as the original code.

Image 5
The next step is to turn off the debug compilation flag and turn on the optimizer. The total size of the compiled classes is 54 KB compiled with debug and 46 KB when compiled with -O.

Running without the profiler with debug code, the average time taken to do a load operation, as measured by metrognome, is 0.775 seconds. This is a lot faster than the profiled time of 5.47 seconds, so there is quite a large profiler overhead. When the code was optimized overall performance did not increase much, but because most of the time is spent in system routines this is not really surprising. If we look at the profile of the optimized code, excluding system time, the DataFrame.fetch routine is a lot faster, but that only amounts to a reduction from 1.5 seconds to 1.0 seconds, as the total CPU for five fetch operations. To tune the code further I need to work on making more efficient use of the system functions. Here is the optimized profile for my own code.

Wrap up
So far I have managed to build a program that is fast enough to be useful, but bigger than I would like. Its a useful test bed for me as I learn more about tuning Java code to make it smaller and faster. I was given a useful URL for more information on Java tuning tips: Java Optimization. The subject is not covered much in the many books on Java, but I'll include whatever I can figure out in future columns and the updated version of my own book. Any hints and tips (also GPercollator bugs and fixes) from more experienced Java programmers are welcome.


Click on our Sponsors to help Support SunWorld


Resources

Other Cockcroft columns at www.sun.com

About the author
Adrian Cockcroft joined Sun in 1988, and currently works as a performance specialist for the Server Division of SMCC. He wrote Sun Performance and Tuning: SPARC and Solaris, published by SunSoft Press PTR Prentice Hall. Reach Adrian at adrian.cockcroft@sunworld.com.

What did you think of this article?
-Very worth reading
-Worth reading
-Not worth reading
-Too long
-Just right
-Too short
-Too technical
-Just right
-Not technical enough
 
 
 
    

SunWorld
[Table of Contents]
Subscribe to SunWorld, it's free!
[Search]
Feedback
[Next story]
Sun's Site

[(c) Copyright  Web Publishing Inc., and IDG Communication company]

If you have technical problems with this magazine, contact webmaster@sunworld.com

URL: http://www.sunworld.com/swol-10-1997/swol-10-perf.html
Last modified: